Bayes Multiple Decision Functions 



Wensong Wu 
Department of Mathematics and Statistics 
Florida International University, Miami, FL, 33199 
wensong . wuOf iu . edu 

Edsel A. Pena 
Department of Statistics 
University of South Carolina, Columbia, SC 29208 
penaOstat . sc . edu 

October 4, 2011 



Abstract 



This paper deals with the problem of simultaneously making many (M) binary decisions 
based on one realization of a random data matrix X. M is typically large and X will usu- 
ally have M rows associated with each of the M decisions to make, but for each row the 
data may be low dimensional. Such problems arise in many practical arenas such as in the 
biological and medical sciences, where the available dataset is from microarrays or other 
high-throughput technology and with the goal being to decide which among of many genes 
are relevant with respect to some phenotype of interest; in the engineering and reliability sci- 
ences; in astronomy; in education; and in business. A Bayesian decision-theoretic approach 
for this problem is implemented with the overall loss function being a cost-weighted linear 
combination of Type I and Type II loss functions. The class of loss functions considered 
allows for the use of the false discovery rate (FDR), false nondiscovery rate (FNR), and 
missed discovery rate (MDR) in assessing the decision. Through this Bayesian paradigm, 
the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain 
the optimal Bayes action is described. In contrast to many works in the literature where the 
rows of the matrix X are assumed to be stochastically independent, we allow in this paper 
a dependent data structure with the associations obtained through a class of frailty-induced 
Archimedean copulas. In particular, non-Gaussian dependent data structure, which is the 
norm rather than the exception when dealing with failure-time data, can be entertained. 
The numerical implementation of the determination of the Bayes optimal action is facili- 
tated through sequential Monte Carlo techniques. The main theory developed could also be 
extended to the problem of multiple hypotheses testing, multiple classification and predic- 
tion, and high-dimensional variable selection. The proposed procedure is illustrated for the 
simple versus simple and for the composite hypotheses setting via simulation studies. The 
procedure is also applied to a subset of a real microarray data set from a colon cancer study. 
Keywords: Archimedean Copula, Bayes framework. Decision-theoretic framework. False 
discovery proportion. False Discovery Rate, Frailty, Multiple testing. Sequential Monte Carlo. 
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1. INTRODUCTION 

The advent of computer- automated high-throughput data-gathering technology, epitomized 
by the microarray, has led to the generation of so-called "large M, small n" data sets, which 
are those characterized by a large number, M, of variables (hereon called genes for historical 
reasons), which are observed or measured on a relatively small number, n, of subjects or 
units. Examples of such data sets in different scientific fields could, for instance, be found 
in Hastie, Tibshirani and Friedman (2009) and Efron (2008). 

For such data sets a typical goal is to choose an action, am, associated with gene ni, from a 
set of possible actions. Am, for each of the M genes. For example, in a two-group microarray 
data set, one may want to decide, for each gene, whether it is differentially expressed between 
the two groups (action is a = 1), or whether it is not differentially expressed between the 
two groups (action is a = 0). This situation corresponds to the problem of simultaneously 
testing multiple pairs of null and alternative hypotheses. 

In this paper we shall focus on these two-point action spaces for each of the genes, that 
is, those with Am = {0, 1}. Of interest therefore is to choose a vector of actions 

a=(ai,a2,...,aM)^e^ = {0,l}^ 

based on the observed "large M, small n" data set. For the mth gene there will be associated 
aOm & {0, 1}, which is unknown, representing the correct action to take, which unfortunately 
is unknown. Thus, for the M genes there will be an unknown vector 

6> = (^i,^2,...,M'ee> = {o,ir 

representing the vector of actions that ought to be taken. This 6 will be referred to as the 
state of reality. In light of this state of reality vector 0, a chosen action vector a will have 
consequences quantified through a loss. That is, there will be a mapping 

(a, 0) ^ L(a, 0) 

where L(a, 9) is the loss that is incurred with the action a when reality is 0. Such a loss 
must take into account the loss incurred when the action is a = 1 when reality is ^ = 0, 
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called a Type I error, as well as the loss incurred when the action is a = when reality is 

— 1, called a Type II error. There will be a variety of ways of measuring the overall Type 

1 and Type II errors in such multiple decision problem, which will be formally described in 
Section 2. 

These multiple decision problems appertaining to such "large M, small n" data sets 
therefore lend naturally to a decision-theoretic framework which will be discussed in more 
detail in Section 2. In addition to this decision-theoretic framework, we implement a Bayesian 
approach to decision-making by putting a prior probability on the unknown state of reality 
6. Coupled with the appropriate loss function, we obtain the Bayes multiple decision action. 
To achieve this, we derive the mathematical form of the Bayes multiple decision function, 
abbreviated BMDF, and describe an efficient computational implementation of this BMDF 
under varied scenarios in terms of loss functions and data structures. 

A decision-theoretic and a Bayesian approach to these multiple decision problems with 
high-dimensional data is certainly not new as can be seen from the papers by Miiller, Parmi- 
giani, Robert and Rousseau (2004), Scott and Berger (2006) and Sarkar, Zhou and Ghosh 
(2008). Other approaches include, but not limited to. Storey (2003), Sun and Cai (2007) and 
Pena, Habiger and Wu (2011). See also the monograph Efron (2010a). An innovative major 
contribution of this paper is the use of a general class of loss functions that encompasses 
many of the loss functions that have been used in earlier works. For instance, the general 
class of loss functions introduced in Section 2 includes as special cases those that involve 
false positives and false negatives as well as the commonly-used false discovery rates and 
false nondiscovcry rates. Another major contribution is an efficient algorithm for compu- 
tationally finding the Bayes multiple decision action, an algorithm that has computational 
order of at most O(M^logM). Many papers have dealt with the situation where the ob- 
servables from each of the genes are stochastically independent. We go beyond this usual 
assumption by incorporating dependencies among these observables, with the dependence 
structure induced by frailty-type models, which also takes the form of Archimedean copulas. 
This dependent modeling approach utilizes ideas from the survival analysis area. The sta- 
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tistical models governing these multiple decision problems possess more complications than 
the description above. This is so since, even though the parameter of main interest is the 
state of reality vector 0, there will be other unknown model parameters that are present 
which may be viewed as nuisance, and we need to be able to deal with these nuisance pa- 
rameters as well, both in constructing the Bayes multiple decision functions and in the prior 
probability specification. In our development of the BMDF we therefore first consider the 
situation of a simple null hypothesis versus simple alternative hypothesis setting wherein 
the distributional model for the random observable for the mth gene is completely known 
under either 6*^ = or 6*^ = 1; and then utilize the results for this setting to solve the 
problem for the composite versus composite setting where unknown nuisance parameters are 
present. An interesting development is the use of Sequential Monte Carlo (SMC) techniques 
to numerically approximate the Bayes multiple decision action especially in the presence of 
dependencies among the observables and also in the prior probability specification. 

We now outline the contents of this paper. In Section 2 we will introduce the mathemat- 
ical setting and elements of the multiple decision problem, including a general class of loss 
functions. Section 3 will demonstrate the general form of the BMDF along with a computa- 
tionally efficient algorithm of finding this BMDF in both simple and composite hypotheses 
testing settings. Section 4 will give the expressions of the BMDF under three concrete loss 
functions. We will introduce the frailty-based dependent data and models in Section 5, and 
in Section 6 we will discuss the computational aspects of the posterior calculations under the 
dependent models, and give some algorithms using Sequential Importance Sampling (SIS). 
In Section 7 we will illustrate the BMDF in some concrete multiple decision problems, and 
compare the performance with currently used procedures via simulation studies. We will 
also apply the BMDF to a subset of a microarray data set. We will conclude the paper in 
Section 8 with some remarks. 
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2. DECISION-THEORETIC ELEMENTS 
2.1 Multiple Decision Problem 

Let (Q, J^, V) be a basic statistical model with V being a collection of probability measures. 
For m = 1,2, . . . , M, where M > 1 is a known integer, let '■ i^,J^) — > iXm,Sm), 
where is some space and is an associated cr-field of subset, of A'^. Let X = 
{Xi,X2, . . . , Xm) '■ J^) — >■ {X, B), where X — (^^^^ X^ is the product sample space and 
B is the associated product (T-field. A reahzation X = x will be called a (sample) data. For 
any P & the induced joint probability measure of X is Q = PX^^, whereas the marginal 
probability measure of X^n is Qm — P^m^- Let Q — {PX~^ : P G P} denote the collection 
of all probabihty measures of X. Consider a mapping = i?2, ■ ■ ■ , '^mY : Q — > {0, 1}^, 
where -dm : Q -> {0, 1} only depends on the mth marginal probability measure. In essence, 
the parameter of main interest is 'd{Q) = 6 = {9i, 02, ■ ■ ■ , OmY, which takes values in the 
parameter space = {0, 1}^. Observe that Q can be decomposed via Q — 
where Qg = {Q E Q : = 0}yO £ O- Let a — (ai, 02, . . . , om)^ be an action 

in the action space A = {0,1}*^. Let L° : ^ x Q — ?> M be a loss function such that 
L°{a.,Q) = L(a,i?((5)), Va e A,yQ e Q, where L:>lx6^Misa loss function belonging 
to a class jC which will be discussed in Section 2.2. 

Let V{A) be the collection of all probability measures over A. A randomized decision 
function is a measurable function S : X ^ ^(^4) such that 5(x)(a) = (5(x)({a}) denotes the 
probability of choosing action a e ^ given data x. Therefore, to make a specific decision 
via 5 given data x, an element a e ^ will be chosen according to the probability mass 
function 5(x)(-). Let V denote the space of all such randomized decision functions. Note 
that a nonrandomized decision function S* : X ^ A, which maps x e X directly to an 
a e ^ belongs to V, since it is just a degenerate probability measure over A, that is, 
5(x)(A) = /(5(x) G A). 

The risk of a randomized decision function 5 e P for any Q & Q associated with a loss 
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L° is defined by 

Bee 

Now, assume that for any 6 e ©, is an identifiable parametric class given by 
Qe = {Qei' ; le) '■ lo ^ Tg}, where 7^ is a nuisance parameter. This implies that Q is 
an identifiable model with respect to the parameter (0, 70) which belongs to the enlarged 
parameter space ©° = Uegofi^} x Te]- Then, for any 5 and Q e Q, the risk function is 
given by R°{S, Q) = Eeee 19))I{'&{Q) = 6>, Q = <50(-, ; Te)), where i? : P x 0° ^ R 

is given by 

R{5,{e,^e)) = £;x^Q,(,^,)£;A^5(x)(.)i^(A,0) = / E ^(a,0)[5(x)(a)]Qe(dx;7e). 

Furthermore, the decomposition of Q becomes Q = l+Jege |<50(-; 76») '■ '&{Qe{-'ile)) = 
6,j0 e rgj. A prior probability measure on Q can be constructed by specifying a prior 
probability measure on (©°,(t(©°)), where a{Q°) is the cr-field generated by a semi- ring 
C = {{6} xCg : Cg e a{re), 6 e 0} with aiFg) being the cr-field on Fq. Define a probability 
measure 11* on C such that for any 6q E Q and any Csq G (T(r0Q), 

where n(-) is a probability measure on © and Pei-) is a probability measure on Fg. This 
induces, by Caratheodory's extension theorem, a prior probability measure n° on (0°, a{C) = 
(t(0°)). Since the elements of Q are identified by (0,7©), there is a one-to-one mapping h : 
©° — > Q with {6,^g) A Qe{-;^e)- Therefore, n° determines a prior measure on {Q,a{Q)), 
where cr(Q) — ha{Q°). The Bayes risk function of a randomized decision function 6 E T> ior 
a prior 11° is defined via 

ruo{6) = EQ^u^R°{6,Q)= [ I (0, 7e))P«(d7«)n(d0). 
A randomized decision function 5* is called a Bayes multiple decision function (BMDF) if 

5* — argmin rn°((^). 
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The multiple decision problem is to find the BMDF, which is the optimal procedure for 
choosing the M-dimensional action vector from the Bayesian decision theory perspective. 
More practically, there is the issue of finding the optimal action in a computationally efficient 
manner. 

2.2 Loss Functions 

The loss function L : ^ x G — )■ M quantifies the error committed for a pair (a, 0) of action 
and state of reality. We shall consider a class of cost-weighted loss functions with elements 

L(a, 6) = CoLo(a, 6) + C,L,{a, 6), (1) 

where Co > and Ci > are pre-determined costs for loss functions Lq and Li, respectively. 
The generic forms of Lq and Li are 

Lo(a,0) = K(a^l)go(a)]M/3o(0'l)ho(0)], 

Li(a,0) = K(a^l)gi(a)]^[/5i(0^1)hi(0)], 

where aj : R ^ R, : R ^ R, gj : A ^ A, and : O ^ O for j = 0, 1. We 
assume further that, for j = 0, 1, gj is Tj-invariant with respect to the sub-action space 
Ak = {si ^ A : aJl = k} ior k e M. = {0, 1, ... , M}, in the sense that there exists a 
mapping Tj : M. ^ M. associated with such that a e implies gj{a.) e ^r,(fe)- Examples 
of T include the identity mapping with To{k) = k and also Ti{k) = M — k. Then go with 
go(a) = a is ro-invariant, while gi with gi(a) = 1 — a is ri-invariant. With a\/b = max(a, b), 

some examples of loss functions Lq and Li on ^ x © are the False Positive Proportion (FP): 

a'^fl — 6) (1 — aVd 

(a,0) ^ ^ ^ — the False Negative Proportion (FN) (a,0) ^ - — — ; the False 

Discovery Proportion (FDP) {a.,6) ^ ^ — the Missed Discovery Proportion (MDP) 

fa Ij V 1 

(a, 6) \ T ' — ; and the False Nondiscovery Proportion (FNP) (a, 6) i-)- -—-^ — . 

^ ' (6>^1)V1 ^ , ; ((l-a)Tl)Vl 

Besides the aforementioned properties of Lq and Li, we also assume that Lq and Li pos- 
sess a complementarity property given by gi(a) = ao — Aigo(a), for some ao G ^ with Ai > 
0. This property describes a relation between Lq and Li which indicates that they are loss 



functions having complementary behaviors. For example, FP and FDP are proportions of 
false discoveries, where a discovery at the mth coordinate is having am = 1, whereas FN, 
MDP, and FNP are proportions of false nondiscoveries. In the sequel, we will consider the 
pairs (FP, FN), (FDP, MDP), and (FDP, FNP) for (Lo,Li) in the multiple decision prob- 
lem. The cost constants Co and Ci will generally be determined by the decision maker or 
subject matter specialist, and they reflect the consequences of false discoveries and false 
nondiscoveries. 

2.3 Multiple Testing Problems 

The simple-versus-simple multiple hypotheses testing problem is a particular case of this 
multiple decision problem. Suppose that the marginal probability distribution of Xm satis- 
tisfies Qm £ {QmQ,Qmi} with Qmo 7^ Qmi, ^ud the parameter vector is — {I{Qm — 
Qmij-ifn — 1,2,..., M). We may consider simultaneously the M pairs of simple versus sim- 
ple hypotheses Hmo ■ Qm = Qmo versus H„a : Qm = Qmi for m = 1,2, ...,M. In this 
case, for m = 1,2, ...,M, Om — 1(0) indicates whether Hmi is (not) true, and the action 
Qm — 1(0) means rejecting (not rejecting) H^o- 

Usually, independent BernouUi priors are assigned to 6. Let TTmo^TTmi G (0, 1) be such 
that TTjno + TTmi = 1 for m = 1, 2, . . . , M. The prior probability mass function tt on is 
specified by 

M 

<0) = n vry-vr^i/(^^„ G {0, 1}). (2) 

■m = l 

In this situation, the Bayes risk function of a randomized decision function 5 for a prior mass 
function tt of is r^(5) = Ee^^^ Ex^Qg Ea^5(X)(-) L{A,d), associated with a loss function 
L E where Qe is the joint probability function of X, given 6, and whose mth marginal 
distribution function is Qm = Qm6m- 

Suppose that Qm, the marginal distribution of Xm, is in a class of distributions Qm given 
by Qm = {Qm{- ; 7m, Cm) : 7m £ T^, ^ £ Assumc, for m = 1, 2, . . . , M, = F^o U F^i 
and r^o l~l = 0. Then Qm has two subclasses denoted by 

QmO = {Qm{- ; 7m, Cm) : 7m G Tj^O, C ^ -m} and Qmi = {Qm{- ] Im-, Cm) : 7m G T^l, C ^ -m}- 
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Consider the M pairs of composite hypotheses Hmo '■ Qm £ QmO versus H^i '■ Qm £ Qmi, 
for m = 1,2, . . . , M . Note that Fmo, Tmi, or could be the same for all m, though 
in general they may be different. Let 6 — {I{Qm £ Qmi),"^ = 1,2, . . . , M) — {I{'ym £ 
r„i),m = 1,2,. ..,M) e e = {0,1}^, 7 = (7i,72,---,7m) e T = ^f^^T^, and £ = 
(^1,^2, • • • ,Cm) e S = Also, for any e Q, let = iS)m=i^rnem- Then the 

extended parameter vector is 

(0,7,Oee°^ l+|{{0}xr, x5}, (3) 

6»ee 

where e G is the parameter of main interest in the multiple decision problem, while 
(7, e r X S are nuisance parameters. Note that the value of 7 determines the value of 6, 
but we only want to determine whether 7^ G Tmo or 7^ € Tmi rather than estimating the 
exact values of ■jrn.s- The parameter e and the action a e ^ have the same interpretation 
in this composite hypotheses testing problem as in the simple-vs-simple setting. 
Assume that the prior distribution on the enlarged parameter space 0° is 

M 
m=l 

where Pmo and Pmi are prior densities on F^o x S and r^i x S, respectively, and with 
Ti'mOj^mi £ {0)1} and TTmo + T^mi = 1- The Bayes risk function of a randomized decision 
function S for a prior density tt of (0,7, associated with a loss function L e £ is r^^lS) — 
-^(0,7,O~T-Ex~Qe(.;74)-£^A~5(x)()-^(A,0), wherc Qei-;^,^) is the joint probability function 
of X, given (0,7,^), and the marginal probability measures are Qm = Qm{.-',lm,im), "m- ~ 
1,2, ... , M. To indicate that the marginal G QmOm^ shall denote it by QraOmi.''^ Imi Cm)- 

3. BAYES MULTIPLE DECISION FUNCTIONS 
3.1 BMDF in Simple Hypotheses 

Let 7r(-) be a prior probability mass function of G 9. Then The Bayes risk function 
of 5 for the prior tt is given by r^{5) = EQ^^R{S,d) = Eq E:j(.\0 Ea^5{x.){-) L{-^,0) = 
£;x -E^ix -Ea~5(x)(.) -^(A, d) = £;x -Ea^5(x)(.) -Eeix L{A.: 0), where E^^ix is the expectation with 
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respect to the posterior distribution of given X. For a e ^ and X = x e A", define tfie 
posterior expected loss by, 

L(a,x) = £;0|x=xiv(a,6>), (5) 
and denote the optimal action when X = x by 

a*(x) = argmin L(a, x). (6) 

Then the BMDF is 

5*(X)(5) = /(a*(X) e 5), Be a{A). (7) 

Notice that 5* is degenerate at the nonrandomized decision function 5 : X ^ A with 
5(x) = a*(x), which implies that we may always choose the BMDF to be nonrandomized. 
It is also worth noting that finding the optimal action a*, and thus the BMDF 5*, via 
equation (6) involves searching for the minimizer of the function L among all 2^ elements 
of A. When M is relatively large, the searching order of 0(2^) would become practically 
infeasible. Furthermore, note that this computational problem does not yet include the 
problem of computing the posterior distribution of given X = x. 

The idea for obtaining a computationally efficient algorithm to find the optimal action is 
to first find the restricted optimal action over the sub-action space Ak = {si & A : 3^-1 — k} 
for each A; e Al, and then to find the optimal action among these restricted optimal actions. 
Before presenting the results, we first define some relevant quantities that will be used. With 
the notation of the loss L e £ described in Section 2.2, for k e let 

do(A;,x) = Coao{k)Ee\^=^ [/3o(0'l)ho(0)]; (8) 
di(A;,x) = C^a^{k)Ee\^=^ [/3i(0"l)hi(0)]; (9) 
e(A;, x) = do(A;, x) - A^di{k, x); (10) 

and let r(A;,x) = {r-m{k,ii), m = 1,2, . . . , M) be the rank vector of e(A;,x). Also, define 

M 

i/(A;,x) = aSdi(A;,x) + ^/(r^(A;,x) <ro(A;))e^(A;,x), (11) 
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where we recall that tq : — >■ is a mapping such that a G Ak implies go (a) G ^ro(fc)- 

The following theorem describes a computationally efficient algorithm for finding the 
optimal action vector. 

Theorem 1 For a multiple decision problem with loss function L & C and prior probability 
mass function tt on 0, letk*:X^M. be defined via 

A;*(x) = argmin i7(/c,x), x e A", 

where H : M x X ^ W is defined in (11). Then the BMDF is of the form (7) with a*(x) 
satisfying 

go(a*(x))= (^/{r^(r(x),x) <ro(r(x)}, m = 1, 2, . . . , . (12) 
The searching order for obtaining this BMDF is no more than O(M^logM). 

Proof: Associated with the loss function L, L defined in equation (5) has a specific form 
given by L(a,x) = X)]=o ^jK(^'^l)gj(^)]^ -^0|x=x [^j(0'^l)h^(0)]. Restricting a on Ak, 

L(a, x) = go(a)Mo(A;, x) + gi(a)Mi(A;, x) 

= ajdi(/i;, x) + go(a)^[do(/i;, x) - Aidi{k, x)] 
= aodi(/c,x) +go(a)'^e(/c,x), 

where do{k,x.), (ii(A;,x), and e(A;,x) are as defined in (8)-(10). 

Since for a e Ak, go(a)'^l = To{k), the optimal action on Ak, denoted by aj^(x), which 
minimizes I/(a, x) for a. & Ak, therefore satisfies 

go(a^(x)) = (^I{r,{k,yi)<ro{k)}J{r2{k,^)<ro{k)},...J{rM{k,^)<ro(k)}^ 

where we recall that, for m = 1,2, .. . ,M, r^(A;,x) is the rank of e^(/c,x) among the el- 
ements of e(/c,x), m = 1,2, ...,M. Thus L(a^(x),x) = aQdi(/c,x) + Y.m=i H''^m{k,x.) < 
To{k))e-m{k,^), which equals the function H{k,:K). Therefore, for the A;*(x) in the statement 
of Theorem 1, a^«^^)(x) minimizes -^(a, x) over all actions a e ^. The optimal action, given 
X = X, is therefore a*(x) = a-^*(x)(^)) which satisfies (12). 
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To see the computational order of the algorithm, for e A1, observe that to find a^(x), 
it is only necessary to know which are the To{k) smallest among all the elements of e(/c,x), 
and the order of the computational complexity is typically bounded by 0{M + To{k) logM) 
(Knuth 1973). Upon obtaining a^(x), one only needs to search the minimum of H(k,x) for 
k G A4. Therefore, the total searching order is 0{M + To{k) logM). The worst-case 

scenario is when To{k) = M, k = 0,1, . . . , M, which leads to an upper bound of the searching 
order of 0(M2 logM). □ 

Note that the searching order of 0(M^ log M) is a considerable improvement over 0{2^). 
This is due to the special form of the loss function and the nature of the parameter space, 
action space, and multiple decision function space. In a lot of cases, including the specific 
pairs of loss functions that we will discuss later in Section 4, the searching order may still 
be lower than 0{M'^ log M). 

Observe that in the BMDF described in Theorem 1, we need to obtain the posterior 
expectation of the form £'0|x=x [/3(^^l)h(0)]. Recall that X^, given 0, has the marginal 
distribution QmOm- We assume for now that the XmS, given 0, are independent for m — 
1,2,..., M. In general, we may also model the X^s to be dependent as will be discussed in 
Section 5, in which case the computation of the posterior expectation will be discussed in 
Section 6. Denote the density of QmOm QmOm- Suppose an independent prior distribution 
of the form described in (2) is used. Then, the posterior distribution of also makes ^^s 
independent, with 

nm{em\^) = y-QrnoJxJ for m = 1, 2, . . . , M. 

Therefore, 

M , s 

Notice that, in general, this is a sum of |6| = 2^^ terms. A particular case is £^(0|x) = 
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(E(^i|x), ^(^2|x), . . . , ^(^m|x)) where, for m = 1, 2, . . . , M, 



E{dm\^) = P{em = l|x) = 



It is worth pointing out that, in general, each component of the posterior expectation 
-^0|x [/5(^^l)h(0)] is needed to obtain the BMDF, so that each component of 6* may depend 
on all components of X. This makes the BMDF a compound decision function (Sun and 
Cai 2007). In essence the decision for the mth component borrows information from the 
other components, or as Efron (20106) mentioned, the decision makes use of direct evidence 
from the mth component of the data as well as indirect evidence from other components. 

3.2 BMDF in Composite Hypotheses 

Let TT be a prior density function on the enlarged parameter space ©° (see (3) on page 10). 
Then the Bayes risk of a randomized decision function S is given by 

r^(5) = -E'(6»,7,4) -£^x|(6»,7,$) -E'a~5(x)(-) L{A, 0) = £'x -Ea~5(x)(-) -E'eix L{A, 0). 

Observe that the final form of the Bayes risk is exactly the same as in the simple-vs-simple 
setting. This implies that the results in Theorem 1 apply directly. However, since the param- 
eter space where the prior distribution is defined is now enlarged, the posterior expectation 
£;0|x L{A, 0) in the Bayes risk, or Ee\^ [^(0^1)h(0)] in Theorem 1 is now taken with respect 
to the marginal posterior distribution of 0, given X = x. 

Assume the X^nS, given 6*, are independent, for m = 1,2,..., M. Denote the density of 
Qm{' 1 1fmi ^m), under H^Om^ by gme^(- ; 7m, Cm)- K an independent prior of the form in (4) is 
assigned, then the marginal posterior distribution of also makes ^^s independent, with 



jfi \ in J 

7rm(c^m|xj 



where 



Cm)dCmd7m- (13) 
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Therefore, 

M 



m V TfL / 

In particular, E{e\x.) = (^(^i|x), ^(^2|x), . . . , E(^m|x)) where, for m = 1, 2, . . . , M, 

-C'(,C'm|Xj = z 7 — z 7 r. 

Notice that the integral in qmOmixm) may not be in a closed form. Thus Monte Carlo 
techniques may be needed even in this independent setting. Similarly to the simple- vs-simple 
setting, the BMDF in this composite hypotheses setting is of a compound nature. 

4. SOME CONCRETE SITUATIONS 

4.1 FP and FN Loss Functions 
Consider the loss function 

L{FP,FN){^i = CqLfp{b-, 6) + CiLi7jv(a, 6) = Cq — — ^ + Ci- — — , 

where Lpp and Lpjsf are the false positive proportion and false negative proportion. It is 
clear that the optimal action minimizing L(^fp,fn) is ^*fp,fn) (x) = (( ^*{FP,FN)i'^))m,'>^ — 
1, 2, . . . , M), where, for m = 1, 2, . . . , M, 

The corresponding BMDF S* is such that S*{X.) — a*(X). This BMDF is of intuitive form 
in that the decision on each component is based only on E{9jn\^) — P{Om — l|x) and the 
threshold is just Co/(Co + C'i)- Note that, under M = 1, this is the Bayes test corresponding 
to a Co/C'i-loss function (Casella and Berger 2001). 

4.2 FDP and FNP Loss Function 
Consider the loss function 

L{FDP,FNP){a-, 0) = CqLfdp{^, 0) + CiLfnp{^, 0) = ^^J^riy^ + ((l^^^ay^iyVI ' 
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where Lfdp and Lfnp are the false discovery proportion and the false non-discovery pro- 
portion. Note that for this L ^ C, go(a) = a, ao = 1, Ai = 1, and To{k) = k. Let 
(0(1) (x), 0(2) (x), . . . , 0(M)(x)) denote the ordered vector associated with E{d\x). Then, in 
Theorem 1, we have 

ELi(1 -</'(M-m)(x)) , ^ Ej=fc+i<^(M-m)(x) 



k " M-k 

Letting A;*(x) = argmin^.^^ H{k,x), then, by Theorem 1, the optimal action is 

^iFDP,FNP){^) =(i"(r^(A;*(x),x) < /c*(x)), m = 1,2,...,M) 

= (/(rank(E(^„|x)) >M-k*{x) + l),m = 1,2,...,M). 

Notice that for all k = 1,2, .... M — 1, H{k, x) depends only on the ordered vector of 
£^(0|x), which means that in order to select A;*(x) we only need to sort E{d\ii.) once. Also, 
the optimal action only requires the rank vector of -^(^Ix) after A;*(x) has been obtained. 
Therefore, the searching order is reduced to O(MlogM). 

For this case, the posterior means of the 6'^'s are still the basis of decisions, but in 
contrast to the previous case, the decision at any particular component depends on all 
posterior means. One may conclude that the Bayes multiple decision function does not 
depend on the magnitudes of the E{9„i\y^), but rather only on their relative ranks. However, 
this is not the case since their magnitudes are actually needed to determine k* (x) . 

4.3 FDP and MDP Loss Functions 
Consider the loss function 

a'^(l — 0) (1 — slYO 

L(FDP,MDP){a; 0) = CoLoi(a, 0) + CiLii(a, 0) = ^o-^-j^y^ + Ci ^^^^^ ^ ^ , 

where Lqi and Ln are the false discovery proportion and the missed discovery proportion. 
Same as for the pair of FDP and FNP, go(a) = a, ao = 1, = 1, and To(/c) = k. For 
A; = 1,2,..., M, let 

e(A;,x) = ^£;(6>|x) + CiE ^ ^ 



k ' V(f^l) VI 
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and denote by (e(i)(/c, x), e(2){k, x), . . . , etM){k, x)) the ordered vector of e{k, x). Then 



Cl'E ((^|x) + Co - ELi e(M-.+i)(A;, x) if ^ ^ 



and 



F(x) 



if Co > maXk^M\{o} Yli=i ^(M-i+i) (/c, x) 

arg maXjtg^\{o} E!=i e(M-j+i) (/c, x) otherwise 



By Theorem 1, the optimal Bayes action is 

a*(x) = (/(rank(g„(A;*(x),x)) > M - k*{x) + 1), m = 1,2, . . . , M). 

Observe that a*(x) and A;*(x) depend on the values and ranks of em{k,:x.) for A; e Ai. 
The searching order in this case is O(M^logM). It can be shown that when the posterior 
probability distribution of 6 specifies independent components, the searching order is reduced 
to O(MlogM). 

5. DEPENDENT DATA STRUCTURE 
In section 3, formulas of the posterior expectations are given under the assumption that the 
Xm's, m = 1,2, ... , M, are independent. However in various situations this assumption is 
far from being realistic. Therefore, we consider a dependent data structure, namely in the 
general setting introduced in section 2.1, the distribution of X given (6,^0), denoted by 

In the simple hypotheses setting, the goal is to specify Qe, the joint probability distri- 
bution of X given 6, such that the marginal probability distribution of X^, is Qm — QmOm- 
Let Mo{9) ^ {m e {1,2,...,M}: 9m ^ 0} and Mi{0) = {m e {1, 2, . . . , M} : ^„ = 1}. 
Assume that Xmo{0) = {Xm'- ^ G A^o} is a collection of independent random vectors, and 
Xmi{0) = {Xm'- m e is a collection of possibly dependent vectors, and the collections 
Xmo{9) ^iid ^Mi{0) ^'^^ independent of each other. Moreover, borrowing survival analysis 
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ideas, we assume that the dependence structure of the collection X_\4-^(^g) is that induced by 
a frailty-based model. Specifically, assume that 



meMi(9) 



meMiie) 



for all Bm e Bm, Tn e M.i{0), where Qm's are some distributions on A'^'s, and the frailty 
Z & Z is assumed to have a distribution G. Therefore, the joint distribution of Xj^^^p-^ is 

Qe[ n [X^ e £J = / n [Qm{Bm)YG{dz), (14) 

for all Bjn e Bm, m e A4i{6). Recall that in the simple- vs-simple multiple hypotheses 
testing setting, under Hmi, X^ ~ Q^i marginally. So the distributions Qm, m G A4i{0), 
should be such that these conditions are satisfied. Let JCg be the Laplace transform of the 
distribution function G, that is, Caiu) = e-''^G{dz), Vm G M. Denote Mi = \Mi{d)\. The 
following result gives the joint distribution of the dependent collection Xj^^f^g^ in terms of the 
collection of the marginal distributions {Qmi ■ m E M.i{6)} under the assumed frailty-based 
model. 

Proposition 1 The frailty-based model described in (14) is an Mi- dimensional Archimedean 
copula Go such that 

Qe I n ^ = CG{Qmi{Bm),me Mm) 

\meMi{e) j 

for all Bm e Bm, where Gq'- [0, 1]^^ — >■ [0, 1] is defined via 

/Ml ^ \ 
Cg{ui, U2,..., UMi) = I XI jO,QiUm) j ■ 

Proof: According to the model, marginally for m e M.i{d) and all Bm £ Bm, 

Qml{Bm) = j^Qm{BmYG{dz) 

- j^exp { - ^{ - logQ^(5„)}}G(d^) = CG{-\ogQm{Bm)). 
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Thus Qm{Brn) = exp{- C^^ {Qml{B„i)) . So, 



.meMi{0) 



L 



m&Mi{e) 

exp - 



z G{dz) 



\meMi{0) 

To show Cg is an Archimedean Copula, it is sufficient to show that the function 
is a strict generator of a copula, which is that it is a continuous strictly decreasing convex 
function from [0, 1] to [0,oo] with Cg^{1) = and C^^{0) = oo (Nelsen 1999). But these are 
straight-forward to verify using properties of the Laplace transform. □ 

Thus, a frailty-induced dependent full data model is given by 

( fl [Xm e BM = i n QmoiBmU Cg [^ml (^m) , ^ G A1l(6>)], (15) 

for all Bm e Bm, rn — 1,2,...,M. Notice that the distribution function G may have 
nuisance parameters, say, G{-) = G{-,v), where v e T. In this case, to calculate the 
posterior expectations, a prior on T will also be needed. 

In the composite hypotheses testing setting, we are to specify Qe{-;^, £), the joint prob- 
ability distribution of X given (0,7,^), such that the marginal probability distribution of 
is Qm = Qm9mi''j1m,im)- The rcsult in Proposition 1 is easily extended to get 



Qe n [X„ e 5J;7,€ = Cg Qmi{Bm;7m,U),m e Mi{d) 
and the full data model is therefore given by 



M 



\m=\ 



Cm) CGlQmliBm; ),m e Mi{e)\ 

.m&Mo{9) 
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6. SEQUENTIAL MONTE CARLO 
The applicability of the algorithm of finding the Bayes optimal action in Theorem 1 is 
contingent on an efficient way of calculating the posterior expectations 



where H{0) takes the form (5{0^1)h{6) and vr(0|x) is the posterior probability mass function 
of 0, given data x. For example, the specific forms of the H function desired in case of FDP 
and MDP loss functions are H{d) — and H{0) — jjg^^- As pointed out in Sections 3 and 
5 Monte Carlo integration is needed for approximating the posterior expectations. However, 
in the regular Importance Sampling (IS) algorithms (Ripley 1987), as M, the dimension 
of 0, increases, the computational complexity of calculating the weights also increases. So 
it is important to consider a sequential application of the importance sampling methods 
(Gordon and Smith 1993). Notice that m = 1, 2, . . . , M does not necessarily represent time 
or positions in an ordered sequence, and that the proposed dependent data structure is not 
necessarily a state space model as what is usually the case in sequential importance sampling 
applications, but the technique provides a visual solution through m = 1, 2, . . . , M that deals 
with the dimensionality and monitors the efficiency of the sampling procedure. 

6.1 Simple Hypotheses 

Let TT be a prior probability mass function of 6. Under the dependent data model described 
in (15), the desired posterior expectation is given by 



where Qeish^) = [Yl„,(,Mo(e)^"io{xm)j CG{qmi{xm),m e Mi{6)), and qmo and g„,i are the 
density functions of Qmo and Qmi, respectively. Consider an independent data-adaptive 
trial probability mass function g oi given by g{0) — g{0\^) — nm=i where, 
for m = 1,2, . . . , M, gm{Om\xm) is the marginal posterior of 9m, given Xm = Xm- This is a 
Bernoulli distribution of form 



E{H{0)\y. = ^) = Y,H{0)Ti{0\^), 



I{^) = E{H{0)\^) 



^,^^H{0)n{0)Qe{dx) 



0, 



'm 



(16) 
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where TTmiQm) is the marginal prior probabiUty of 6m- Denote by Xi:;^ — (xi, 0:2, ... , Xm) and 
Gi:m = {61,62, ■■■ , 6m) for m = 1, 2, . . . , M. The joint prior distribution of 6 can be written 
as 7r(0) = Y^=i'^m{6m\Q\:m.-i), where 7ri(^i|0i:o) is the marginal distribution of ^i, and for 
m — 2,3,...,M, 'Km{6m\0\:m-i) IS the probability of 6m given Oi-.m-i under the joint prior 
71. Notice that if n is independent, then 'n'm{6m\0i:m-i) = T^m{6m)- However, dependent prior 
structures can also be constructed by the frailty-induced models similarly to the dependent 
data structure, in which case T^m{(^m\Oi:m-i) is not necessarily reduced to 7r^(^m)- 

Let q0^,^{-x.i,m) be the marginal density function of Xi-^, given 6i,m, under the dependent 
data structure proposed in (15). Since the dependent structure is induced by a frailty model, 
we have, for m = 2, 3, . . . , M, ^^^^^(xi.^) = qei..m{^m\^i:m-i) ■ qoi-.m-A^i-.m-i), where 



?0i:m(^l:m) 



QmoiXm) if 6'^ = 



/I \ 101:m\ l-m/ J 

CG{qtM),teM,{d,..m-,)) 
Thus the full density of the data has the sequential form qe{y^) — Y{^=iqei:m{'^m\'^\:m-i) 
with q'ei(a;i|xi:o) = 1. So there is a recursive formula for calculating the importance weight 
function, given by Wm{0i:m\^i:m) = Wm-iiOi:m-i\^i:m-i)um{Gi:m\^i:m), where the increment 

/rt I \ q0i:mi^m\^l:m-l)T<'{6m\&l:m-l) n 
Um{Ol:m\^l:m) = T^, ^ Satisfies 



7:{6m = O\0,..m-l) .^^^^Q 



l:m ) 0^ 



7^m{6m = 0) 

CG{qtl{Xt),t e Mljei.m)) _ n{6m = l|6>l:^-l) ^ ^ ^ ' 

CG{qa{xt),t e Mi{ei:m-i)) qml{Xm) '^m{6m 1) 

(17) 



Note that wm{6i:m\^i:m) = u]{6\:>c), the importance weight function. In summary, we can 
now sample particles and calculate the importance weights in a sequential manner. 

The fundamental difficulty of SIS is the degeneracy of the weights. For large values of 
M, the weights w'^^)(0|x), r = 1, 2, . . . , i?, are all close to except for one that is close to 
1, which would result in a poor estimate eventually. One of the solutions to this difficulty 
is through resampling, or using the so-called the bootstrap filter (Liu 2001), in which, after 
resampling, all the importance weights are set to 1/Rso that we make sure all particles make 
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important contributions to the MC estimate. But resampling at each m = 1, 2, . . . , M may 
be computationally expensive, so we would only resample whenever the empirical effective 
sample size is too low. Via this procedure, we can make sure that the weights will not 
diverge. 

Algorithm 1 (SMC in Simple Hypotheses) 

1. Fix a large integer R and a threshold p e (0, 1]. 

2. Iterate for m ^ 1,2, M . 

(a) For all r = 1,2, R, generate 6m independently from gm{-\xm) in (16), and set 

air) _ (air) nir)\ 

(b) Compute the increments Um{Oiln\^i:m) given in (17), and the importance weights 

win' = ^^(^SilXi.^) = Wm-l{O^ll^-l\yil■.m-l)Um{0lL\^l■.m)■ 

(c) Compute ESS = R/ Etii^^^^f- 

(d) If ESS < pR, normalize the weights, and resample, with replacement, R particles 
from {O^l^: r = 1, 2, . . . , i?} according to the normalized weights, and set all the 
weights to 1/R. 



3. /(x) = E{H{d)\yi) is approximated by /(x) 



M 



6.2 Composite Hypotheses 

Let TT be the prior probability function on the enlarged parameter space 0° described in (4) , 
and denote the independent Bernoulli marginal prior probability on © by ttq. Then, under 

the dependent data model described in (15), the desired posterior expectation is given by 

E06e^o(6')Q0(dx) 

where Qeldx) = (^HmeMoW ^rno(a;m)) Cq qmi{xm),m G Mi{d) , and qmo and Qmi are as 
defined in (13). Consider a data-adaptive trial density g on the enlarged parameter space, 
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given by g{e, 7, ^) = g{e, 7, ^ |x) = llm=i 9m{0m, 7m\xrn), where, for m = 1, 2, . . . , M, 

and for j = 0, 1, qmj{xm) = Qrujixm; 7nij{xni):imj{xm)): where ^mj{xm) and |mi(2^m) are some 
convenient estimates, for example, maximum likelihood or method-of-moments estimates, of 
7^ and ^m, under the marginal models ~ Qmj- Notice that if q is replaced by q, Qm 
would be equal to the marginal posterior of {9m, 1m, Cm) given Xm- Since qmj{xm) is an 
approximation of qmj{xm), gm also provides some guidance to the posterior distribution by 
making use of the data. Write the prior in (4) as 7r(0, 7,^) = Y[m=i''^rn{9m,1m,Cm), where 
for m = 1, 2, . . . , M, 7r^(6'„, 7^, Cm) = [T^moPmoilm, ^m)]^"^" [T^miPmiilm, Cm)Y'^ ■ Similarly to 
the simple- vs-simple case, the full density of data has a sequential form given by ge(x; 7, ^) = 

nm=l?0i:m(^m|Xl:^-i;7l:m>^l:m)> whcrC 



QOumi^ml^l-.m-l, 7l:m) ^1: 



qmo{-^rn, 1m, ^m) if 

CG{qti{xt;it,Ct),te MiiOi.m)) 



CG{qtl{Xt;jt,Ct),t e A^l(01:m_l)) 

So the recursive formula for calculating the importance weight function is 



if 9m = 1 



"Wmi^l-.ml^l-.m', Tl:m' ^l:m) ~ "^m-l (^l:m-l |Xl;m-l j ' €l:m-l)'^m(^l:m l^limj Tl:m' ^l;m)' 

where the increment satisfies 



■^ml^limlxiimi Tlim-l' ^l:m-l) 



^m) a n 

Z 7 ^ II = U 

qmO\Xm) 

CGiqti{xt;it,Ct),te Mi{Oi,,n)) .^^ 

qml (-^m) 



Therefore, the SIS algorithm is very similar to that for the simple- vs-simple case with the 
increment replaced by (18). 

7. ILLUSTRATION AND SIMULATIONS 

7.1 Composite Alternatives with Independent Gaussian Observations 

Assume Xi\9i are independent N(ij,0.,l), where /Xoi = and /Xij ~ A'"(0, cr^) for fixed a. 

Consider tests i^oi : /^i = vs Hu : /li ^ with independent prior {9i,iii) ~ (1 — 7r)I{9i — 
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0) +7r0(/ij; 0, a^)I{9i = 1), where tt is the fixed prior probabihty of the alternative hypotheses, 
and 4>{-; /i, a^) is the density function of a normal distribution with mean and variance a^. 
Then the posterior means are 

E(e \X)^ 7r0(X,;O,l + a^) 

^ (1 - n)<P{Xi; 0, 1) + 7r<P{Xi; 0, 1 + a^) ' 

We performed 1000 simulations with M — 12, true a — 4 and true proportion of alternatives 
TT = 0.5 for all three procedures. For both correct prior parameters with tt* = 0.5, a* — A and 
misspecified prior parameters with tt* = 0.7, a* = 10, the empirical FDP, FNP and MDP 
were calculated lor Co/Ci = 1, 2. The results in Table 1 compares the BMDF associated with 
the three pairs of loss functions to the Benjamini and Hochberg (BH) procedure (Benjamini 
and Hochberg 1995) with a false discovery rate (FDR) threshold 0.05. To find the BMDF 
associated with the (FDP, MDP) pair of loss functions, the posterior expectation E{6 / [O^lM 

1) |x) was calculated exactly by using a recursive formula and approximately by a Monte Carlo 
approximation. Note that the use of BH procedure in the simulation study is to enable 
comparison with the most commonly-used, albeit frequentist, multiple testing procedure. 
However, it worth mentioning that it is not totally fair to compare the BMDF to the BH 
procedure because they are designed under different criteria. In Table 1, the empirical 
risks of the three BMDF are comparable to those of the BH procedure when the cost ratio 
Co/Ci = 2, and with a smaller cost ratio (= 1), the empirical FDP become larger and 
the empirical FNP and MDP become smaller since with a smaller cost the BMDF sacrifices 
larger Type I error to achieve an optimal combined risks. With misspecified prior parameters, 
the combined empirical risks stay almost the same as those with the correctly specified prior 
parameters, but the Type I error FDP becomes smaller and the Type II error FNP and MDP 
become larger. This is because we specified a higher prior probability of the alternatives 
than the truth, which results in more discoveries. Finally, the exactly and approximately- 
calculated posterior expectations result in similar BMDF associated with the (FDP, MDP) 
pair of loss functions in terms of the empirical risks. Since the exact calculation is too 
computationally expensive for large M, in the following illustration we will utilize the Monte 
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Carlo approximated posterior expectations. 

[Table 1 about here.] 

7.2 Simple- vs- Simple with Dependent Exponential Observations 

Consider a situation where, for all m, Xmi, ^m2, • • • , ^mn are IID with Q„io = {EXP{Xjn) '■ 
Am = Ao}, and Qmi — {EXP{Xm) '■ = Ai}. A Gamma(K, frailty induces dependency 
among {X^ : m e Ali}. The independent Bernoulli prior with P{9m = 1) = tt is used. 
Data was generated under the true model parameters: M = 500, n = 30,7r = .30, Aq = 
1, Ai = .5,K = 2. To find the BMDF, a frailty-model with exponential marginals and a 
Gamma(«;, k) frailty is used. In the Sequential Monte Carlo, R = 1000 particles are used 
with TT* — .20, K* — 3. Results in one rephcate for different pairs of loss functions is shown in 
Table 2. The cost ratio used is Cq/Ci = 1. Notice that the performances of all three BMDF 
are satisfactory even under misspecifications of the prior probabilities of the alternative 
hypotheses and the hyperparameter in the distribution of the frailty. 

[Table 2 about here.] 

7.3 Two Group Composite Hypothesis with Independent Gaussian Observations 
Assume that = (X^i, X^i, ■ ■ ■ , Xmni,Ymi, ^m2, ■ ■ ■ , Yrnui) are independent with Xmi ~ 

N{flrnl,(^m) and Ymi ~ iV(/Xm2 , CT^) • Considcr tests ffom : f^ml = f^m2 vs Him ■ l^ml 7^ I^m2 

with independent Bernoulli prior on 6 with probability tt for the alternatives, and conjugate 
prior for nuisance parameters given by, form e A4i, ^mi,A^m2 ~ -^(1/^, fcocr^), cr"^ ~ 
Gamma(a,/3), and for m G A^o, /^mi = /^m2 ~ A^(^m, ^oo"m)) ~ Gamma(a,/3). True 
parameters used to generate data are M = 500, tt = 0.1, /cq = 200, o; 4, /3 = 4, 1/ = 20. 
We performed 1000 simulations each with correct prior parameters and misspecified prior 
parameters: tt* = 0.05, = 100, a* = 20 as well as empirically estimated other parameters 



via u = x^, P = S'^{xm){(y. — l)/(A;o-l- 1)). In order to stabilize the results when tt is small, we 
the three BMDF associated with different loss functions with 10 different cost ratios Cq/Ci 



used an adjusted version of MDP (AMDP) given by L(a, d) = t^^txTT" implemented 
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and BH procedure with 10 different FDR thresholds. Figure 1 and Figure 2 show graphs of 
three empirical risk pairs for all four procedures with correctly specified prior parameters and 
partially misspecified and partially empirically-estimated prior parameters, respectively. In 
Figure 1 with the correct prior specification, the empirical risk curves for all three BMDF are 
well below that for the BH procedure indicating better performance, except for the BMDF 
associated with (FDP, AMDP) pair of loss functions where the average FP loss is surprisingly 
high. This is because with small prior probabihty tt for the alternatives, the AMDP could 
be large for some simulated data, and when the cost ratio Cq/Ci is very small, the BMDF 
would rather sacrifice a large number of false positives to achieve the optimal combined risk 
of FDP and AMDP. In Figure 2, when the prior parameters are partially misspecified and 
partially empirically estimated, the results of all four procedures almost coincide. However, 
more study about the empirical Bayes ideas is needed in future research. 

[Figure 1 about here.] 

[Figure 2 about here.] 

7.4 A Real Microarray Data Analysis 

In a colon cancer tumor metastasis study conducted in the laboratory of Dr. Marge Pena 
at the University of South Carolina, expression levels for 41268 genes from mice tissues 
were obtained through a Agilent Technology microarray. For each gene, five replicates were 
obtained for a control group and five replicates for a metastatic group. For our illustration, 
we randomly selected 500 genes out of the 41268 on which to apply our BMDF. We assume 
the independent two-group Gaussian model, and use the partially empirically-estimated prior 
parameters described in Section 7.3. Cost ratios used for (FP, FN), (FDP, FNP) and (FDP, 
AMDP) loss function pairs are Cq/Ci = 3, 0.2, 2 respectively, and for BH procedure the FDR 
threshold is 0.05. These cost ratios and thresholds were chosen according to the simulation 
results in Section 7.3 in order for the four procedures to have similar empirical FDP. Out of 
the 500 genes, the BMDF associated with (FP, FN), (FDP, FNP) and (FDP, AMDP) loss 
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function pairs found 15, 14 and 14 genes differentially expressed across groups, respectively, 
and the BH procedure found 10. Figure 3 shows the mean expression level of the control 
group versus the treatment group of the discovered genes. Notice that all 10 genes discovered 
by the BH procedure were also discovered by the three BMDFs. 

[Figure 3 about here.] 

8. CONCLUDING REMARKS 

BMDF defines a class of multiple decision procedures that achieve optimality in the Bayesian 

framework associated with a general class of loss functions. The results in Theorem 1 describe 

the form of the BMDF and provide an efficient algorithm of finding it in multiple testing 

settings. Notice that the pairs of loss function are not limited to the ones described in Section 

(1 — aVd 

4. For example, the adjusted MDP given by L(a, 0) — „ may help stabilize the Bayes 

{6 + 1 

optimal actions when the prior probabilities of the alternative hypotheses iTm — Pr{9m = 1) 
are small. Also notice that the choice of the loss function pairs and the cost ratio should be 
pre-determined. 

The frailty-based dependent data structure is a class of ffexible models if the distribution 
of the frailty is modeled in a hierarchical manner with hyperparameters. Similarly, the prior 
distribution could also be dependent with frailty-based structures. Furthermore, the SMC 
could be easily implemented in the computations of the posterior expectations. However, not 
all dependent structures is frailty-based. Therefore, in real data analysis, model validation 
is needed to see the validity of the dependent structure. 

One possible extension of the research in this paper is in two-class prediction problems 
where the form of BMDF could be extended with the loss functions replaced by prediction 
errors. Other future studies include the extension to model selection problems, and also in 
the empirical Bayes approach for determining prior hyperparameter values. In particular, it 
is of interest to study whether the empirical Bayes procedures are equivalent in some sense 
to the non-Bayesian BH multiple decision functions and the procedure in Peiia et al. (2011). 
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Figure 1: Graphs of {FP, FN), {FDP, FNP), and {FDP, AMDP) for three BMDF 
with different loss functions when cost ratio varies and the BH procedure when the FDR 
threshold varies in a two-group composite hypothesis problem with independent Gaussian 
observations. Correct prior parameters are specified. 
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Figure 2: Graphs of (FP, FN), {FDP, FNP), and {FDP, AMDP) for three BMDF with 
different loss functions when cost ratio varies and the BH procedure when the FDR threshold 
varies in a two-group composite hypothesis problem with independent Gaussian observations. 
The prior parameters are partially misspecified and partially empirically estimated. 
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Figure 3: Mean expression level of the control group versus the treatment group for 500 genes 
in a microarray data set. Gray circles are non discoveries and black circles are discoveries. 
Top left: BMDF with (FP, FN) loss function pair; top right: BMDF with (FDP, AMD?) 
loss function pair; bottom left: BMDF with (FDP,FNP) loss function pair; bottom right: 
BH procedure. 
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Table 1: Average of empirical FDP, FNP and MDP in 1000 simulations in a composite 
multiple testing problem with independent Gaussian observations. True prior parameters 
are tt = 0.5 and cr = 4. 
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Table 2: Results in one simulation in an exponential multiple testing problem. 
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